Abstract

Background: Setosphaeria turcica is a fungal pathogen that causes northern corn leaf blight (NCLB) which is a 
serious foliar disease in maize. In order to unravel the genetic architecture of the resistance against this disease, a vast 
association mapping panel comprising 1 487 European maize inbred lines was used to (i) identify chromosomal 
regions affecting flowering time (FT) and northern corn leaf blight (NCLB) resistance, (ii) examine the epistatic 
interactions of the identified chromosomal regions with the genetic background on an individual molecular marker 
basis, and (iii) dissect the correlation between NCLB resistance and FT. 

Results: The single marker analyses performed for 8 244 single nucleotide polymorphism (SNP) markers revealed
seven, four, and four SNP markers significantly (α = 0.05, amplicon-wise Bonferroni correction) associated with FT,
NCLB, and NCLB resistance corrected for FT, respectively. These markers explained individually between 0.36 and
14.29% of the genetic variance of the corresponding trait.

Conclusions: The very well interpretable pattern of SNP associations observed for FT suggested that data from 
applied plant breeding programs can be used to dissect polygenic traits. This in turn indicates that the associations 
identified for NCLB resistance might be successfully used in marker-assisted selection programs. Furthermore, the 
associated genes are also of interest for further research concerning the mechanism of resistance to NCLB and plant 
diseases in general, because some of the associated genes have not been mentioned in this context so far. 



Background 

Setosphaeria turcica (anamorph Exserohilum turcicum, 
formerly known as Helminthosporium turcicum) is a fun- 
gal pathogen that causes northern corn leaf blight (NCLB) 
in maize. NCLB is a serious, omnipresent foliar disease 
[1,2]. Infections of maize with NCLB before silking can 
cause grain yield losses of more than 50%, which are 
accompanied by a reduction in feed value and the predis- 
position of infected plants to stalk rot [3]. 

Plants have evolved qualitative and quantitative resistance
to combat pathogens. Qualitative resistance
typically confers a high level of resistance, is usually race
specific, and is based on single, mostly dominantly acting 
genes (R genes; for review see [4]). For NCLB, qualitative 
resistances have been identified and called Ht genes (for 
Helminthosporium turcicum): Ht1 [5] and HtP [6] were
mapped to the long arm of chromosome 2, Ht2 [7] as
well as Htn1 [8] were mapped to the long arm of chromosome 8,
and Ht3 was the only resistance gene that was
ever introgressed from Tripsacum floridanum into maize 
[9]. These single resistance genes have been backcrossed 
into a number of widely used inbred lines, where they 
showed partial dominance and expression dependent on 
the genetic background [10]. Furthermore, the expression 
of the Ht genes is modified by the environment, partic- 
ularly temperature and light intensity [11]. In addition, 
qualitative resistances conferred by single genes such as
the Ht genes tend to be overcome by new, virulent races
of Setosphaeria turcica (e.g. [12,13]). All these aspects limit
the practical value of the Ht genes and have hampered 
their use in maize breeding programs. 

Quantitative resistances are considered to be oligo- or
polygenically inherited and, thus, partially as well as
moderately effective, but race unspecific and durable (for review
see [14]). Due to the latter two properties, quantitative
resistances are today considered more useful in a breeding
context than qualitative resistances. In agreement with this
conclusion, the majority of disease resistances deployed in
elite varieties of maize are quantitative. However, identification
of genes conferring quantitative resistance is much
more challenging than identifying R genes, owing to their
smaller phenotypic effects.

Various studies have been conducted to map quantitative
trait loci (QTLs) for resistance to NCLB (for review
see [15]). All of them were linkage mapping studies using
different types of progenies such as F2 or F3 generations,
BC1 generations, or populations of near isogenic
lines or recombinant inbred lines. In these studies, QTLs
were detected on all maize chromosomes except chromosome
ten. Due to the large confidence intervals of
QTLs and a restricted allelic sampling in the two parental
genotypes, however, the results of linkage mapping studies
have so far had little impact on resistance breeding. Very
recently, NCLB resistance in maize was dissected using
the nested association mapping (NAM) population [16],
which offers the advantage of a higher mapping resolution
and a broader allelic sampling than the above-mentioned
linkage mapping studies. Nevertheless, population-based
association mapping has the potential of resulting in an
even higher mapping resolution and broader allelic sampling
compared to NAM [17]. To our knowledge, however,
no genome-wide population-based association mapping
study has yet been conducted for NCLB resistance
in maize.

Resistance genes identified by linkage or association
mapping might affect the disease either directly or
indirectly (cf. [18,19]). Genes affecting plant growth and
development or time to flowering (FT) fall in the latter
class. Especially for diseases caused by necrotrophic
pathogens such as Setosphaeria turcica, which are more
severe on senescing leaf tissue after anthesis, a relationship
between plant disease resistance and FT might be
expected [20]. Despite the contradictory results from earlier
phenotypic analyses (e.g. [21,22]), some QTLs for
NCLB resistance found in meta-analyses colocalized with
those for FT and maturity (for review see [15]). However,
in contrast to these linkage mapping studies, our
association analysis allows discrimination, with a high
mapping resolution, between pleiotropy and linkage of
QTL for NCLB resistance and FT (cf. [23]).

In this study, a large association mapping panel comprising
1487 elite maize inbred lines was used to (i) identify



Table 1 First- and second-degree statistics for flowering
time (FT) and northern corn leaf blight (NCLB) resistance of
the 1487 phenotyped and genotyped inbred lines

                  Trait
Parameter         FT             NCLB
mean(M_i)         55.2           4.8
range(M_i)        34.0 - 73.5    0.5 - 10.0
σ²_g              42.34          3.24
σ²_g×e            2.21           0.63
h²                0.95           0.87

FT is measured in number of days to female flowering after June 1.
NCLB is rated from 1-9 (sensitive-resistant).
M_i is the adjusted entry mean of genotype i calculated across all environments;
σ²_g and σ²_g×e are the genotypic and genotype × environment interaction
variances.
h² is the heritability on an entry mean basis.

chromosomal regions affecting FT and NCLB resistance, 
(ii) examine the epistatic interactions of the identified 
chromosomal regions with the genetic background on an 
individual molecular marker basis, and (iii) dissect the 
correlation between NCLB resistance and FT. 

Results 

For the whole set of phenotyped and genotyped inbred
lines, the heritability of FT and NCLB resistance was 0.95
and 0.85, respectively (Table 1). NCLB was significantly
(r = 0.53, α = 0.05) correlated with FT (Figure 1) in the
whole set of 1487 inbred lines. The Pearson correlation
coefficient was lower within the four heterotic pools and
ranged from 0.27 (Stiff Stalk; SSS) to 0.33 (Flint) (Figure 1).

Figure 1 Regression curves of the adjusted entry means of
flowering time (FT) vs. northern corn leaf blight resistance
(NCLB) for the entire set of 1487 maize inbred lines as well as the
individual heterotic pools. r is Pearson's correlation coefficient
between the two traits.

Figure 2 Observed vs. expected P values for different two-step
association mapping methods of northern corn leaf blight
resistance with simple sequence repeat (SSR) and single
nucleotide polymorphism (SNP) markers.



For the SSR markers, the observed P values obtained
with the QK and K models showed a smaller deviation from the
uniform distribution than those obtained with the
ANOVA and the Q model (Figure 2). Furthermore, the mean
squared difference (MSD) between observed and expected
P values was slightly smaller for the QK model than for
the K model. This was true for the SSR markers as well as
for the SNP markers (0.041 versus 0.042; 0.005 versus
0.007, respectively). The population background structure
accounted for 21, 6, and 2% of the genetic variation in FT,
NCLB, and NCLB resistance corrected for FT (NCLB_FT),
respectively.

In single marker analyses, seven, four, and four SNP
markers were significantly (α = 0.05, amplicon-wise
Bonferroni correction) associated with FT, NCLB, and
NCLB_FT resistance, respectively (Figure 3). For FT, the
seven SNPs explained individually between 5.39 and
14.29% of the genetic variance, whereas all SNPs together
explained 13.20% (Table 2). For NCLB and NCLB_FT, the
four SNPs explained between 3.32 and 4.78% and between
0.36 and 6.76% of the genetic variance, respectively. In a
simultaneous fit, they explained 8.18 and 9.48% of the
genetic variance of NCLB and NCLB_FT, respectively.

In the Flint, Lancaster, SSS, and Iodent pools, two,
four, two, and six SNPs were significantly (α = 0.05,
amplicon-wise Bonferroni correction) associated with
FT (Additional file 1: Figure S1), which explained in a
simultaneous fit 1.87, 22.99, 21.35, and 25.50% of the
genetic variance in the corresponding heterotic pools
(Table 3). For NCLB, two and six significantly associated
SNP markers were identified in the SSS and Iodent pools,
respectively, but none in the Flint and Lancaster pools
(Additional file 2: Figure S2). Similarly, one and three SNPs
were found to be significantly associated with NCLB_FT in
the SSS and Iodent pools, respectively (Additional file 3:
Figure S3). The SNPs associated with NCLB explained in
a simultaneous fit 9.38 and 28.94% of the genetic variance,
whereas those associated with NCLB_FT explained 0 and
23.20% of the genetic variance in the SSS and Iodent pools,
respectively (Table 3).

The three rounds of multiple forward regression
revealed for the whole set of 1487 inbred lines three SNP
markers to be significantly associated with FT and NCLB,
but only two with NCLB_FT (Table 4). The simultaneous
fit of these SNPs explained 16.65, 7.62, and 6.13% of the
genetic variance of FT, NCLB, and NCLB_FT, respectively.
Significant (α = 0.05, amplicon-wise Bonferroni correction)
epistatic interactions were identified between the
significant SNPs from the single marker analyses as well
as the multiple forward regression procedure and all other
SNPs for FT and NCLB resistance, respectively (Figure 4).
No significant epistatic interactions were detected for
NCLB_FT. The epistatic interactions found for the two
traits explained a maximum of 5% of the genetic variance
(Additional file 4: Figure S4).

Discussion

Statistical aspects of association analysis 
One-step vs. Two-step approaches 

In all genetic mapping experiments, the one-step 
approach, in which phenotypic and genotypic data are 
analysed in a single step, is the only fully efficient analy- 
sis [24]. However, a comparison with the two-step analysis 
showed only a marginal increase in the empirical type I 
error rate [25]. As the two-step analysis is computation- 
ally much less demanding, we used this approach in view 
of the large data set analysed in our study. 

Alternative association mapping models 

Several methods for association analysis in plants have
been described recently [25-27]. In order to identify the
most appropriate association mapping method for our
data set, we compared several models for the background
SSR markers with respect to the deviation of the P values
from a uniform distribution [28]. Under the assumption
that our SSR markers are unlinked to functional
polymorphisms due to their low genome coverage [29],
the P values observed for an association mapping approach
are expected to be uniformly distributed
(cf. [26]). The mean of squared differences (MSD) between
observed and expected P values of all marker loci was
therefore calculated as a measure for the deviation of the
P values from a uniform distribution. The results of these
analyses (Figure 2) suggested that the QK method [26]
with kinship matrix K calculated as the fraction of shared
alleles [27] was the most appropriate method for our data
set with respect to the adherence to the nominal α level.
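
The MSD criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `msd_from_uniform` is hypothetical, the expected value of the i-th smallest of n P values is taken as i / (n + 1) (the text does not state which convention was used), and the P values are simulated rather than taken from the study.

```python
import numpy as np

def msd_from_uniform(p_values):
    """Mean squared difference between sorted observed P values and
    their expected values under a uniform(0, 1) distribution."""
    p = np.sort(np.asarray(p_values, dtype=float))
    n = p.size
    # Assumption: the i-th smallest of n uniform P values is expected
    # at i / (n + 1); other plotting positions are equally defensible.
    expected = np.arange(1, n + 1) / (n + 1)
    return float(np.mean((p - expected) ** 2))

# Simulated P values, for illustration only.
rng = np.random.default_rng(0)
well_calibrated = rng.uniform(size=1000)   # model adheres to nominal level
inflated = rng.uniform(size=1000) ** 3     # too many small P values

print(msd_from_uniform(well_calibrated))
print(msd_from_uniform(inflated))
```

A model whose P values for unlinked background markers follow the uniform distribution more closely yields a smaller MSD, which is the sense in which the QK and K models were preferred over the ANOVA and Q models.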

Figure 3 Genome-wide P values for association analysis of flowering time (FT; A), northern corn leaf blight (NCLB; B), and FT corrected
NCLB resistance (C) for the entire set of 1 487 maize inbred lines. The ten colors represent the ten chromosomes. The horizontal dotted and
dash-dotted lines correspond to a nominal 5% significance threshold with Bonferroni and amplicon-wise Bonferroni correction, respectively.
Significant P values are represented by a star.

Table 2 Single nucleotide polymorphism (SNP) marker loci significantly associated with flowering time (FT), northern
corn leaf blight (NCLB), and FT corrected NCLB (NCLB_FT) resistance for the entire set of 1 487 maize inbred lines as
identified by single marker analysis

Trait     Marker locus  Gene         Chr. bin     Position (cM)  Position (bp)  P value      Allele 1/2   Effect (allele 1-2)  pG (%)
FT        M_00048149    LE00126      8.05         187.40         128429853      1.5e-06**    G/A          1.66                 10.72
          M_00041827    AY104033     8.05         188.17         130730508      6.8e-08**    [illegible]  2.21                 9.75
          M_00041828    AY104033     8.05         188.17         130730581      6.6e-08**    C/T          2.21                 9.81
          M_00044984    CL4016       8.05         191.34         131678990      3.8e-06**    T/C          1.49                 6.34
          M_00044985    CL4016       8.05         191.34         131678953      8.7e-06*     A/G          1.43                 5.39
          M_00049486    LE00214      8.05         197.92         145084250      3.0e-08**    [illegible]  -2.03                14.29
          M_00049487    LE00214      8.05         197.92         145084209      3.4e-07**    C/A          -1.66                9.10
          Simultaneous fit                                                                                                     13.20
NCLB      M_00040077    AY105483     2.08         343.92         217079800      1.4e-05*     G/A          -0.54                3.32
          [illegible]   [illegible]  [illegible]  [illegible]    [illegible]    [illegible]  [illegible]  [illegible]          [illegible]
          M_00043741    AY111579     6.05         146.44         145332550      1.3e-05*     C/T          0.61                 4.27
          M_00048400    LE00018      7.02         178.00         100239763      3.3e-06**    T/C          -0.47                4.78
          Simultaneous fit                                                                                                     8.18
NCLB_FT   M_00045254    AY107778     7.04         391.46         169937062      2.4e-07**    T/C          -1.89                0.36
          M_00041355    AY107035     9.03         99.76          43679500       1.4e-05*     A/C          -0.58                6.76
          M_00041356    AY107035     9.03         99.76          43679530       9.3e-06*     G/A          -0.58                6.66
          M_00039528    AY104217     9.05         174.09         135147555      9.1e-06*     A/G          -0.72                4.04
          Simultaneous fit                                                                                                     9.48

pG is the proportion of the explained genotypic variance.
*Significant at P<0.05 with amplicon-wise Bonferroni correction.
**Significant at P<0.05 with Bonferroni correction.



times at the significance level α. Across all tests, however,
the experiment-wise type I error rate will be much higher
than α (e.g. [30]). To overcome this problem and obtain an
appropriate significance threshold, it was recommended
to apply the Bonferroni correction [31], where the α level
is divided by the number of independent tests. However,
determining the number of independent tests is not
straightforward in the context of genome-wide association
mapping studies. Owing to the correlation structure
among markers, it would be overly conservative to use the
total number of markers as a substitute for the number of
independent tests [32]. As the 8 244 SNP markers of our
study were derived from 2 973 amplicons and SNPs from
the same amplicon tend to show higher correlations than
SNPs from different amplicons [29], we used, besides the
total number of SNPs, also the number of amplicons as
correction factor for the Bonferroni procedure.
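
The two thresholds implied by this paragraph follow directly from the numbers given in the text (α = 0.05, 8 244 SNPs, 2 973 amplicons). The short sketch below works them out; the helper names `bonferroni_threshold` and `significance_label` are hypothetical, and the star labels mirror the footnotes of Tables 2 and 3.

```python
ALPHA = 0.05
N_SNPS = 8244        # SNP markers analysed (from the text)
N_AMPLICONS = 2973   # amplicons the SNPs were derived from (from the text)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

snp_threshold = bonferroni_threshold(ALPHA, N_SNPS)            # about 6.1e-06
amplicon_threshold = bonferroni_threshold(ALPHA, N_AMPLICONS)  # about 1.7e-05

def significance_label(p_value):
    """'**' passes the SNP-wise threshold, '*' only the amplicon-wise one."""
    if p_value < snp_threshold:
        return "**"
    if p_value < amplicon_threshold:
        return "*"
    return ""

print(snp_threshold, amplicon_threshold)
print(significance_label(3.8e-06), significance_label(8.7e-06))
```

Because the amplicon-wise divisor is smaller, its threshold is less stringent, so a P value such as 8.7e-06 is significant only under the amplicon-wise correction.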

Single marker analysis vs. multiple forward regression 

An efficient approach to identify significant marker-phenotype
associations in spite of the collinearity between
markers might be multiple forward regression (cf. [33]).
We applied this approach in the context of mixed-model 
analyses and detected SNPs that have not been detected 
with the single marker analysis (Table 4). Furthermore, 



The use of the QK method for the SNP-phenotype association
analysis, however, resulted in fewer associations
compared to the K method (data not shown). Because it
is not possible to determine whether these associations
were lost due to the lower power of the QK method or
because they are caused by population structure, we chose
the conservative approach and discuss below only the results
of the QK method.
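
The kinship matrix K used in the K and QK models is described in the text only as the fraction of shared alleles [27]. As a minimal sketch under simplifying assumptions (fully homozygous inbred lines, one allele code per line and locus, no missing data, and no further rescaling that the cited method may apply), such a matrix could be computed as follows. The function name and the toy genotypes are hypothetical.

```python
import numpy as np

def shared_allele_kinship(genotypes):
    """Pairwise fraction of loci at which two lines carry the same allele.

    genotypes: array of shape (n_lines, n_loci) with one allele code per
    line and locus (assumes homozygous inbred lines, no missing data).
    """
    g = np.asarray(genotypes)
    # Compare every line with every other line, locus by locus.
    same = g[:, None, :] == g[None, :, :]
    return same.mean(axis=2)

# Toy data: three lines scored at four biallelic loci.
toy = [[0, 1, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 1]]
K = shared_allele_kinship(toy)
print(K)
```

The resulting matrix is symmetric with ones on the diagonal; in a mixed model it would enter as the covariance structure of the random genotype effects.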

An alternative to single marker analysis is haplotype-based
association analysis. This requires the building of
haplotypes based on the extent of LD between the single
markers. In the germplasm examined in our study, the
average extent of LD between SNPs within amplicons varied
from r² = 0.253 to r² = 0.304, depending on the
heterotic pool investigated [29]. In the case of such
relatively low levels of LD, the number of haplotypes per
amplicon is high and therefore their frequencies are low. This
in turn leads to a low power for detecting associations by a
haplotype-based analysis. Therefore, we do not consider
haplotype-based association mapping a promising strategy in the
case of our study.
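
The r² values quoted above measure LD between pairs of SNPs. A minimal sketch, assuming homozygous inbred lines coded 0/1 per SNP (so that each line contributes one haplotype) and using hypothetical toy data, shows how r² and the haplotype frequencies within an amplicon could be computed:

```python
from collections import Counter

import numpy as np

def ld_r2(snp_a, snp_b):
    """Squared correlation (r^2) between two biallelic SNPs coded 0/1."""
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()
    d = np.mean(a * b) - p_a * p_b           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                            # a monomorphic SNP has no LD
        return float("nan")
    return float(d ** 2 / denom)

def haplotype_frequencies(snps):
    """Relative frequency of each multi-SNP haplotype (rows = lines)."""
    rows = [tuple(int(v) for v in row) for row in np.asarray(snps)]
    counts = Counter(rows)
    return {hap: c / len(rows) for hap, c in counts.items()}

# Toy data: two SNPs from one amplicon scored in eight inbred lines.
snp_a = [0, 0, 1, 1, 0, 1, 1, 0]
snp_b = [0, 0, 1, 1, 0, 1, 0, 1]
print(ld_r2(snp_a, snp_b))  # 0.25
freqs = haplotype_frequencies(np.column_stack([snp_a, snp_b]))
print(freqs)
```

With r² at this level, all four possible two-SNP haplotypes occur in the toy data and two of them are rare, which illustrates why low LD spreads the lines over many low-frequency haplotype classes.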

Corrections for multiple testing 

In genome-wide association mapping studies with n 
molecular markers, the same statistical test is performed n 



Table 3 Single nucleotide polymorphism (SNP) marker loci significantly associated with flowering time (FT), northern corn leaf blight (NCLB), and FT corrected
NCLB (NCLB_FT) resistance in the different heterotic pools as identified by single marker analysis. pG is the proportion of the explained genotypic variance and SSS
is the Stiff Stalk heterotic pool

Trait     Pool       Marker locus            Gene         Chr. bin  Position (cM)  Position (bp)  P value    Allele 1/2   Effect (allele 1-2)  pG (%)
FT        Flint      M_00049486              LE00214      8.05      197.92         145084250      4.3e-06**  [illegible]  -2.73                1.52
                     M_000471[illegible]     LE00173      9.02      75.54          18397094       5.2e-06**  A/T          -3.61                0.00
                     Simultaneous fit                                                                                                          1.87
          Lancaster  M_0004506[illegible]    [illegible]  3.07      346.79         [illegible]    1.5e-05*   C/T          2.83                 7.48
                     M_00045063              [illegible]  3.07      346.79         [illegible]    1.5e-05*   T/C          2.83                 7.48
                     M_00048149              LE00126      8.05      187.40         128429853      5.8e-06**  G/A          3.15                 10.20
                     M_00049487              LE00214      8.05      197.92         145084209      1.9e-06**  C/A          -4.20                13.64
                     Simultaneous fit                                                                                                          22.99
          SSS        M_00048750              LE00097      2.02      95.07          5820265        2.4e-06**  G/A          13.62                14.80
                     M_00047756              LE00008      5.03      186.65         67510540       5.6e-06**  C/T          -6.25                16.76
                     Simultaneous fit                                                                                                          21.35
          Iodent     M_00039634              AY106491     8.03      115.95         65244298       1.7e-05*   G/A          3.57                 18.18
                     M_00044984              CL4016       8.05      191.34         131678990      1.1e-05*   T/C          1.95                 14.82
                     M_00044985              CL4016       8.05      191.34         131678953      1.2e-05*   A/G          1.98                 15.18
                     M_00040388              AY106357     8.05      192.10         134066305      1.2e-06**  T/C          2.51                 17.14
                     M_00042146              AY109558     8.05      192.10         135061056      1.3e-05*   T/C          2.30                 18.77
                     M_00046254              HDT102       8.05      192.26         136131540      1.4e-05*   T/A          -2.16                17.55
                     Simultaneous fit                                                                                                          25.50
NCLB      SSS        M_00046267              HAM101       2.08      322.28         212539314      1.2e-06**  C/T          1.09                 9.38
                     M_00046268              HAM101       2.08      322.28         212539211      7.3e-07**  T/C          1.10                 11.30
                     Simultaneous fit                                                                                                          9.38
          Iodent     M_00048759              LE00099      3.05      227.44         160670670      1.4e-05*   A/G          1.78                 6.05
                     M_00044733              AZM452718    5.03      187.81         69119810       5e-06**    G/A          -0.90                8.49
                     M_00046712              AY103770     9.03      111.71         99416354       6e-06**    G/C          1.25                 10.33
                     M_00046713              AY103770     9.03      111.71         99416416       9.7e-07**  T/G          -1.32                11.49
                     M_00040630              AY105678     9.05      151.55         130918789      9.1e-06*   C/T          -0.81                10.02
                     M_00040631              AY105678     9.05      151.55         130918887      1.1e-05*   T/C          -0.79                9.08
                     Simultaneous fit                                                                                                          28.94
NCLB_FT   SSS        M_00046331              HAM101       2.08      322.10         212537417      5.9e-06**  [illegible]  -1.01                0.00
                     Simultaneous fit                                                                                                          0.00
          Iodent     M_00040577              AY105760     9.03      99.89          50762538       1.3e-05*   T/G          0.90                 12.87
                     M_00040630              AY105678     9.05      151.55         130918789      5e-07**    C/T          -0.97                18.89
                     M_00040631              AY105678     9.05      151.55         130918887      5.3e-07**  T/C          -0.96                15.99
                     Simultaneous fit                                                                                                          23.20

*Significant at P<0.05 with amplicon-wise Bonferroni correction.
**Significant at P<0.05 with Bonferroni correction.

Table 4 Simultaneous fit of single nucleotide polymorphism (SNP) markers identified by three rounds of multiple
forward regression to be significantly (α = 0.05, amplicon-wise Bonferroni correction) associated with flowering time (FT),
northern corn leaf blight (NCLB), and FT corrected NCLB (NCLB_FT) resistance for the entire set of 1 487 maize inbred lines.
pG is the proportion of the explained genotypic variance

the three SNPs identified with the former method for
FT explained a higher proportion of the genetic vari- 
ance than those identified by using the latter method. 
These results corroborate the appropriateness of multiple 
forward regression procedures for association analyses. 

For NCLB and NCLB_FT, the SNPs identified by this
approach, however, explained in a simultaneous fit a lower
proportion of the genetic variance than the SNPs identified
by the single marker analysis (Tables 2 and 4). This finding
might be explained by the significance levels applied dur- 
ing the single marker analysis, which are not directly 
comparable to those of the multiple forward regres- 
sion. Furthermore, since multiple forward regression for 
mixed-model approaches is computationally demanding, 
we were able to perform only three selection steps result- 
ing in a maximum of three selected SNP markers and this 
provides another explanation for our findings. Therefore, 
in order to take full advantage of multiple forward regres- 
sion, more efficient computation algorithms are required. 
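
To make the selection procedure discussed above concrete, the following sketch performs a greedy forward selection of markers. It is deliberately simplified and is not the mixed-model procedure used in the study: it uses ordinary least squares without population structure (Q) or kinship (K) terms, picks the marker that reduces the residual sum of squares most at each step, and stops after a fixed number of steps instead of applying a significance test. The function name and the simulated data are hypothetical.

```python
import numpy as np

def forward_select(y, markers, max_steps=3):
    """Greedy forward selection of marker columns by residual sum of squares.

    Simplification: plain least squares, no Q or K terms, no significance
    test; the number of steps is fixed in advance.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(markers, dtype=float)
    n, m = X.shape
    selected = []
    design = np.ones((n, 1))                  # start with the intercept only
    for _ in range(max_steps):
        best_j, best_rss = None, None
        for j in range(m):
            if j in selected:
                continue
            candidate = np.column_stack([design, X[:, j]])
            coef, *_ = np.linalg.lstsq(candidate, y, rcond=None)
            rss = float(np.sum((y - candidate @ coef) ** 2))
            if best_rss is None or rss < best_rss:
                best_j, best_rss = j, rss
        if best_j is None:                    # no markers left to add
            break
        selected.append(best_j)
        design = np.column_stack([design, X[:, best_j]])
    return selected

# Simulated data: 200 lines, 20 markers, two of which affect the trait.
rng = np.random.default_rng(42)
markers = rng.integers(0, 2, size=(200, 20)).astype(float)
phenotype = 2.0 * markers[:, 3] - 1.5 * markers[:, 11] + rng.normal(0, 0.5, 200)
chosen = forward_select(phenotype, markers, max_steps=2)
print(chosen)
```

Each step refits the model once per remaining marker, which hints at why the corresponding mixed-model procedure becomes computationally demanding for thousands of SNPs.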

Identified SNP-phenotype associations 

In the entire germplasm set, the population structure
explained 21% of the genetic variation of FT. This finding
suggested that sufficient genetic variation remains for
detection of SNP-FT associations. For FT, we observed
for the single marker analysis a strong P value peak on
bin 8.05, which comprised seven SNPs from four genes
(Figure 3). Furthermore, this region was identified by the
multiple forward regression approach (Table 4). Earlier
studies recognized this chromosomal region as a hot spot
for FT QTLs and genes ([34,35] and references cited
therein). The physical map positions of the significantly
associated SNPs ranged from 128 429 853 to 145 084 250
bp. The observed P value peak at about 130 Mbp is in
proximity to Vgt1, a non-coding sequence regulating the
flowering time gene ZmRap2.7 [36]. However, closer
inspection of that region revealed an additional P value
peak at about 145 Mbp (Additional file 5: Figure S5). This
observation might suggest that in addition to Vgt1 a second
gene could be involved in FT control in this region.
However, the region identified in our study does not
correspond to Vgt2 [37], as the latter FT QTL has been mapped
to the other side of Vgt1 towards the top of the chromosome.
Since the average linkage disequilibrium (LD)
among the significantly associated SNPs in this region was
high, these SNPs are not necessarily located in the causal
genes, but the association might be due to SNPs in strong
LD with polymorphisms in the causal genes (cf. [38]). This,
however, requires further research.

Another gene that is frequently proposed to contribute 
to variation of FT in maize is Dwarf8 (D8) (e.g. [39]). Even 
though our study included six SNPs from D8, we did not 
find any significant association in bin 1.10 where D8 is 
located. Our observation is in accordance with the results 
of [40], who observed no significant association for D8 
in a set of European maize inbred lines. These findings 
might be explained by a correlation of the allele frequen- 
cies of polymorphisms in D8 with population structure 
in the examined germplasm. When correcting for population
structure, it will be impossible to identify such
polymorphisms in association analyses [40].

In addition to the SNPs from the Vgt1 region, we identified
based on the multiple forward regression approach
a SNP from bin 1.07 to be significantly associated with FT
(Table 4). This SNP might be located in the QTL (near
SSR umc1833) upstream of D8 detected in a meta-analysis
[34] and lies in a P-type R2R3 Myb transcription factor.
Since various transcription factors such as LHY [41]
or CCA1 [42] are known to regulate FT in model species,
our finding might suggest that this gene is functionally
involved in FT regulation of maize.

Figure 4 Epistatic networks for flowering time (FT; red), northern corn leaf blight (NCLB; green), and FT corrected NCLB (blue). The
markers showing main effects in the single marker analysis and the multiple forward regression are framed with the respective colors.

In conclusion, we observed for FT in maize a very well 
interpretable pattern of SNP associations that is in har- 
mony with previous genetic analyses. This illustrates that 
data from practical plant breeding programs can be used 
not only to dissect oligogenic [43] but also polygenic 
traits. Furthermore, our findings suggest that the SNP- 
NCLB associations described below might be successfully 
used in marker-assisted selection programs. We identified
five genome regions (four from single marker analyses,
one from multiple forward regression) to be significantly
associated with NCLB resistance (Tables 2 and 4), which is
considerably lower than the number of genome regions
identified by [16]. This finding is most probably due to the
different significance thresholds and study designs used.

None of the associations found in our study was
located in bin 8.05, where earlier studies mapped the
qualitative NCLB resistance genes Ht2 and Htn1 [7,8].
Both these genes have been identified in exotic germplasm
(Australia, Mexico) and, thus, the resistance alleles might
be absent in European elite germplasm. Furthermore,
converted inbred lines carrying these introgressed qualitative
resistance genes were not included in our study in order
to prevent any complications with the identification of
quantitative resistance genes.

One SNP identified to be significantly associated with
NCLB was located in bin 2.08, where the qualitative resistance
genes Ht1 and HtP have been identified [5,6] and
where a QTL was found by [16]. The physical map positions
of Ht1 and this SNP, however, differ by about 10
Mbp. Nevertheless, the SNP is located within the interval
made up by the two closest flanking markers of HtP [6].
Whether this gene, coding for a nonspecific lipid-transfer
protein 3 precursor, contributes directly to NCLB resistance
or is in LD with the causal gene warrants further
research. The same is true for the SNP located in a gene
of unknown function in bin 6.05, which resides within
the confidence interval of a QTL affecting the incubation
period (IP) of NCLB in maize [44] and is located close
to a QTL affecting NCLB resistance and IP [16].

Three SNPs significantly associated with NCLB resistance
were located in bins 5.03, 5.05, and 7.02 (Tables 2 and 4)
and in each case, a distinct peak of P values was observed
(Figure 3). Since all regions have been previously reported
to contribute to variation in NCLB resistance [15,45,46],
this finding suggests that the identified SNPs are either
located in or closely linked with the causal genes (cf. [47]).
In contrast to the SNP in bin 5.03, which is located in a
gene of unknown function, the SNP in bin 5.05 is located
in GPC4, a member of the glyceraldehyde-3-phosphate
dehydrogenase gene family, which is involved in sugar
metabolism and shows expression differences upon
anaerobiosis as well as heat shock [48]. A QTL was also found
in this region by [16], for which a candidate gene was an aldehyde
dehydrogenase. The SNP in bin 7.02 is located in a DBF1-like
gene, which is a member of the Apetala 2/Ethylene
transcription factor family [49] and is supposed to have a
function in abiotic stress responses, especially desiccation
tolerance [49,50].

Dissecting the correlation between FT and NCLB 

The results of our study indicated that FT and NCLB
resistance are correlated across all heterotic pools (r =
0.53, Figure 1). This correlation can be explained by the
fact that NCLB is caused by a necrotrophic pathogen and, thus,
tends to progress more rapidly on senescing tissues [20]. However,
the correlations in the individual heterotic pools were only
moderate (Flint: 0.33, Lancaster: 0.29, SSS: 0.27, and Iodent:
0.29). This suggests that the overall correlation relies to a
substantial part on the differences between the heterotic
pools with respect to FT and NCLB resistance trait values
(Figure 1). Our observation explains why we found
neither for the whole set of genotypes (as we accounted
for population structure) nor in the individual heterotic
pools any overlap between SNPs associated with FT and
NCLB (Tables 2, 3, and 4; Figure 3), and thus, no evidence of
a pleiotropic effect of FT on NCLB resistance at the SNP
level, which is in accordance with the results of [16].
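
The argument that differences between pools inflate the overall correlation can be illustrated with simulated data. In the sketch below, all numbers (pool means, standard deviations, sample sizes, and a within-pool correlation of 0.3) are hypothetical and not taken from the study; only the qualitative pattern is of interest.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pool means (FT in days, NCLB score); NOT from the study.
pool_means = {
    "Flint": (45.0, 3.5),
    "Lancaster": (58.0, 5.0),
    "SSS": (62.0, 5.5),
    "Iodent": (55.0, 4.8),
}
within_r = 0.3                 # assumed correlation inside every pool
sd_ft, sd_nclb = 3.0, 1.0      # assumed within-pool standard deviations
cov = [[sd_ft ** 2, within_r * sd_ft * sd_nclb],
       [within_r * sd_ft * sd_nclb, sd_nclb ** 2]]

# Draw 300 simulated lines per pool from a bivariate normal distribution.
samples = {pool: rng.multivariate_normal(mean, cov, size=300)
           for pool, mean in pool_means.items()}

def pearson_r(xy):
    """Pearson correlation between the two columns of an (n, 2) array."""
    return float(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])

within = {pool: pearson_r(xy) for pool, xy in samples.items()}
overall = pearson_r(np.vstack(list(samples.values())))

print(within)
print(overall)
```

Because pools that flower later also have higher mean scores in this simulation, pooling the lines adds between-pool covariance on top of the within-pool covariance, and the overall coefficient exceeds every within-pool coefficient.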

Furthermore, we found no colocalization between the
SNPs associated with NCLB and NCLB_FT (Table 2;
Figure 3) for the whole set of genotypes. This finding
suggested that some of the SNP-NCLB associations outlined
above for genes involved in heat and drought response
might be due to an indirect link of these two traits with
NCLB resistance as well as FT. Indeed, plants sensitive to
drought stress have a tendency to show early senescence
symptoms, which, in turn, leads to a higher sensitivity to
necrotrophic pathogens such as Setosphaeria turcica [20].

Nevertheless, we identified SNPs in bins 7.04, 9.03, and
9.05 to be significantly associated with NCLB_FT. The first
SNP was located in GID1L2, a gibberellin receptor. Since
gibberellin plays a role in basal disease resistance of
various plant species [51,52], our finding might suggest
that this gene is functionally involved in NCLB_FT
resistance of maize.

The other two SNPs, in bins 9.03 and 9.05, which were also
significantly associated with NCLB_FT, were located in a
gene of unknown function and in a sodium-hydrogen exchanger,
respectively, for which no obvious link to NCLB_FT is
apparent. Nevertheless, we observed distinct P value peaks
for both associations, supporting the hypothesis that these
genes might be the causal genes or closely linked to them.

Congruency of identified associations across 
heterotic pools 

For FT, we found in three of the four heterotic pools
significantly associated SNPs in one (Flint) or two
(Lancaster and Iodent) of the genes that were identified in
the whole set of genotypes in the Vgt1 region (Tables 2 and
3; Additional file 1: Figure S1). In contrast, in the SSS
pool, no significant association was detected for these
loci. This is in accordance with earlier studies, in which
QTLs in the Vgt1 region were not detected in all examined
populations [38,53,54]. One reason could be that in the SSS
pool no LD was present in the region between the causal gene
and the examined polymorphisms. Another explanation might be
that the early allele of Vgt1 does not occur in the SSS
pool, because it flowers later than the other pools.

SNPs significantly associated with NCLB and NCLB_FT
resistance were found in the SSS and Iodent pools, but not
in the Flint and Lancaster pools (Table 3; Additional
file 2: Figure S2 and Additional file 3: Figure S3). One
explanation could be the difference in the extent of LD
between the heterotic pools. LD decays more rapidly in the
Flint and Lancaster pools than in the two other pools,
resulting in a lower genome coverage of 13 and 48% vs. 207
and 121%. Furthermore, the number of markers required to
detect associations explaining a significant part of the
phenotypic variation in the Flint and Lancaster pools
(17 000 and 65 000, respectively) is higher than the number
of SNPs actually available [29]. This could limit the power
to detect associations for NCLB and NCLB_FT resistance in
these two pools, whereas the number of markers required for
the SSS and Iodent pools (4 000 and 7 000, respectively) is
predicted to be sufficient.

Further possible reasons for the imperfect congruency of the
identified associations across heterotic pools are sampling
effects [55] on the one hand and epistatic interactions on
the other. Therefore,
we searched for epistatic interactions between the signifi- 
cant SNPs identified in the whole set of genotypes and all 
the other markers. For FT and NCLB, highly significant 
epistatic interactions were detected (Figure 4) suggest- 
ing that epistasis contributes to the imperfect congruency 
of identified associations across different heterotic pools. 
This was even more important for NCLB, for which 
the epistatic interactions between markers explained as 
much genetic variation as their main effects (Additional 
file 4: Figure S4). These results contradict those of [16],
who did not find significant epistatic interactions between
QTL markers and the other markers. The fact
that elite breeding material was examined in our study, 
which has undergone a long process of selection, whereas 
the NAM population consists of multiple connected 
recombinant inbred line populations, could explain this 
difference. 

Relevance of the identified associations for 
practical breeding 

The significant SNP-FT associations identified in our
study explained about 15% of the genetic variance
(Table 2). This value is much lower than the value reported
by [35]. The difference arises because they used (i) a
stepwise forward regression, (ii) segregating populations,
and (iii) a total of 5 000 genotypes, all of which increase
the power of QTL detection. In contrast to FT, the
associations identified for NCLB resistance in our study
explained only about 5% of the genetic variance (Table 2).
This finding clearly suggests that NCLB resistance is
genetically more complex than FT and that, therefore,
phenotypic as well as marker-assisted selection will result
in a lower gain of selection for the former than for the
latter. Nevertheless, for breeding applications, it seems
more interesting to concentrate on NCLB_FT rather than NCLB,
because the former is corrected for FT, the detected SNPs
explain an even higher proportion of its genetic variance,
and its correlation with population structure is lower.

The proportion of the explained genetic variance was
generally much higher in the individual pools than in the
entire germplasm set (Tables 2 and 3). Partly, this might be
due to the reduced sample size, which leads to an
overestimation of the allele effects and the explained
genetic variance [56]. However, as the individual heterotic
pools still comprise almost 400 genotypes, this
overestimation is expected to be small. More likely, our
observation can be explained by different loci contributing
to the variation of the examined traits in the individual
heterotic pools (Table 3). Another explanation could be
epistatic interactions, whose importance differs among the
heterotic pools. Finally, genome structure differences among
the heterotic pools, such as copy number or presence/absence
variants [57], could explain our observation. Our finding
suggests that, although association analysis across
heterotic pools might be relevant for unraveling the genetic
architecture of some traits, marker-assisted selection
within the individual heterotic pools, as practiced by plant
breeders, is more promising than across heterotic pools.

Although we observed for FT highly significant epistatic 
interactions, these explained only a low proportion of the 
genetic variance compared to the main effects and, there- 
fore, might be disregarded in marker-assisted selection for 
this trait. However, this was not true for NCLB, as the
epistatic interactions in part explained a higher proportion
of the genetic variance than the main effects. Thus, taking
epistasis into account for this trait should increase the
efficiency of marker-assisted selection (Additional file 4:
Figure S4).

Conclusions 

We observed for FT, a trait for which various genetic
analyses in maize have already been performed, a very well
interpretable pattern of SNP associations, suggesting that
data from practical plant breeding programs can be used
to dissect polygenic traits. Furthermore, we described
SNPs associated with NCLB and NCLB_FT resistance that are
located in genes for which a direct link to the trait is
discernible or in bins of the maize genome for which QTLs
have previously been reported. Some of the SNPs showed
significant epistatic interactions with markers from the
genetic background. The observation that the listed SNPs and
their epistatic interactions explained about 10% of the
genetic variance in the entire germplasm set and up to 30%
in the individual heterotic pools suggests that significant
progress towards improving the resistance of maize against
NCLB by marker-assisted selection is possible with these
markers, without much of a compromise in the form of late
flowering. Furthermore, these regions are interesting for
further research to understand the mechanisms of resistance
to NCLB and diseases in general, because some of the genes
identified have not been annotated for these functions so
far. However, as association mapping provides only
statistical, i.e., indirect evidence for the function of the
identified gene [58], a direct proof of the function of the
identified alleles is still necessary.

Methods 

Plant materials, field experiments 

Our investigation was based on a set of 4 149 maize 
inbred lines representing elite European and North Amer- 
ican germplasm. The inbred lines are proprietary to the 
plant breeding company Limagrain (France) and were 
assigned by breeders to four heterotic pools, namely Flint, 
Lancaster, SSS, and Iodent.

In the years 2000 to 2009, these genotypes were evaluated
for their per se performance in routine plant breeding
trials, at different numbers of locations (2-7), with dif- 
ferent experimental designs (randomised complete block 
design, nested design, etc.) and numbers of replicates 
(1-3). The experiments were either naturally infested or 
artificially infested with Setosphaeria turcica according 
to standard protocols [59]. All entries were evaluated for 
FT and NCLB resistance. FT was recorded in number of 
days to silking after June 1. NCLB was rated on a scale 
from 1 (sensitive) to 9 (resistant) at the level of individual 
plots. 

Molecular marker assays 

A subset of 1 487 inbred lines, randomly selected from the
inbreds phenotyped for FT and NCLB, was analyzed with
359 SSR and 8 244 SNP markers (for details see [60]).
The SSRs were selected over years by Limagrain
with respect to their polymorphism information content 
value [61] in various sets of maize inbreds. The SNPs were 
discovered by sequencing 2 973 amplicons in a develop- 
ment set of 30 diverse maize inbreds. From these, SNPs 
which showed an Illumina designability score > 0.4 and 
were not in complete LD in the development set, were 
selected for genotyping the 1 487 lines. The proportion 
of missing data was 5.1% for the SSRs and 2.7% for the 
SNPs. The amplicons had an average size of 477 bp and 
contained on average three SNPs. 

All markers were mapped in the IBM population [62]. 
Chromosomes 1 to 10 carried 59, 42, 41, 34, 36, 31, 36, 31, 
27, and 22 of the SSR markers, respectively. In addition, 1 
456, 858, 902, 898, 1 002, 633, 578, 632, 699, and 586 of the 
SNPs were mapped to chromosomes 1 to 10, respectively. 
The total map length was 4 265 cM for the SSRs and 4 378 
cM for the SNPs. The physical positions of the markers
were extracted from the Zea mays Genome Browser,
Release 2.0.

Genotyping of the SSRs was performed by Limagrain 
Verneuil Holding (Riom, France) using standard proto- 
cols. Genotyping of the SNPs was performed by Bio- 
gemma (Clermont-Ferrand, France) using an Illumina 
Infinium iSelect chip. 



Statistical analyses 
Phenotypic data analyses 

Phenotypic data were analysed based on the following
mixed model:

$$y_{ijklm} = \mu + g_i + u_j + g_i * u_j + o_j t_{jk} + f_{jk} b_{jkl} + p_{jkl} r_{jklm} + e_{jklm}, \quad (1)$$

where $y_{ijklm}$ is the phenotypic observation for the
$i$th maize inbred line at the $j$th environment
(year-location combination) in the $m$th replicate of the
$l$th block in the $k$th trial, $\mu$ the intercept, $g_i$
the genetic effect of the $i$th maize inbred line, $u_j$ the
effect of the $j$th environment, $g_i * u_j$ the
genotype-by-environment interaction, $t_{jk}$ the effect of
the $k$th trial in the $j$th environment, $b_{jkl}$ the
effect of the $l$th block in the $k$th trial of the $j$th
environment, $r_{jklm}$ the effect of the $m$th replicate of
the $l$th block in the $k$th trial of the $j$th environment,
and $e_{jklm}$ the residual. $o_j$ was a dummy covariate of
value 1 in environments with several trials and of value 0
otherwise, $f_{jk}$ a dummy covariate of value 1 in
environments with several trials and blocks and of value 0
otherwise, and $p_{jkl}$ a dummy covariate of value 1 in
environments with several trials, blocks, and replicates and
of value 0 otherwise.

Our study was based on data from 10 years and 23 
locations spread over Europe, resulting in a total of 45 
environments and, thus, the environmental factor was 
regarded as random. Error variances were assumed to be 
heterogeneous among environments. For calculating the
adjusted entry mean $M_i$ for each of the 4 149 inbred lines
across all trials, we regarded $g_i$ as fixed and all other
effects as random.

For estimation of variance components, all effects except
$\mu$, including $g_i$, were regarded as random. Heritability
on an entry mean basis was calculated for the pheno- 
typed and genotyped inbred lines according to [63] for 
unbalanced breeding trials. 

NCLB_FT was calculated according to [64]. A regression
curve of NCLB against FT was computed (Figure 3). The
vertical distance of an inbred's adjusted entry mean to the
regression curve represented its NCLB_FT resistance value.
Negative values indicated susceptible plants and positive
values resistant plants.
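
The correction of NCLB resistance for FT described above can
be sketched as a residual from a fitted regression. The
sketch below assumes a straight line fitted by ordinary
least squares; the actual form of the regression curve
follows [64] and is not specified here. The function names
fit_line and ft_corrected_nclb and the example values are
hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ft_corrected_nclb(ft, nclb):
    """Vertical distance of each NCLB entry mean to the fitted line.

    Assumption: a straight line stands in for the regression curve.
    Positive values lie above the line (more resistant than expected
    for the given FT), negative values below it.
    """
    intercept, slope = fit_line(ft, nclb)
    return [obs - (intercept + slope * t) for t, obs in zip(ft, nclb)]


# Made-up adjusted entry means, not data from the study.
ft_means = [60, 65, 70, 75]    # days to silking after June 1
nclb_means = [3, 6, 5, 8]      # score from 1 (sensitive) to 9 (resistant)
corrected = ft_corrected_nclb(ft_means, nclb_means)
print([round(v, 2) for v in corrected])
```

By construction the corrected values sum to zero and are
uncorrelated with FT under the fitted line, which is the
property that makes them useful as an FT-independent
resistance measure.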

Variance components were determined by the REML 
method. The mixed model analyses were performed with 
ASREML release 2.0 [65]. All other analyses were per- 
formed using the software R [66].

Association analyses 

Single marker analysis: In the second step of our approach,
we used the adjusted entry means for FT, NCLB, and
NCLB_FT to test their associations with each of the 8 244
SNP markers, using the QK method [67]:

$$M_{ip} = \mu + m_p + g_i^* + \sum_{u=1}^{z} D_{iu} v_u + e_{ip}, \quad (2)$$

where $M_{ip}$ is the adjusted entry mean of inbred $i$
carrying the $p$th allele, $m_p$ is the effect of the $p$th
allele of the SNP marker under consideration, $g_i^*$ the
residual genetic effect of the $i$th entry, $v_u$ the effect
of the $u$th column of the population structure matrix $D$,
and $e_{ip}$ the residual. The variance-covariance matrix of
the vector of random effects
$g^* = \{g_1^*, \ldots, g_{1487}^*\}$ was assumed to be
$\mathrm{Var}(g^*) = 2K\sigma_{g^*}^2$, where $K$ was a
1487 x 1487 matrix of kinship coefficients that define the
degree of genetic covariance between all pairs of inbreds,
and $\sigma_{g^*}^2$ is the genetic variance estimated by
REML. The variance-covariance matrix of the vector of errors
was assumed to be $\mathrm{Var}(e) = I\sigma_e^2$.

The population structure matrix Q was calculated based 
on SSR markers using the software STRUCTURE [68] as 
described in detail by [60]. By definition, the z+1 columns
of the Q matrix add up to one. Thus, only the first z 
columns were used as D matrix in our study, to achieve 
linear independence and, thus, avoid singularities. The 
kinship matrix K was calculated as described by [27] based 
on the SSR markers. In addition to the above described 
QK approach, we also examined other models: ANOVA, 
Q, K and Kt (Additional file 6: Figure S6) for SNP markers 
but also for SSR markers [28]. In order to compare these 
different association mapping methods, expected P val- 
ues were calculated and the MSD between observed and 
expected P values of all marker loci was then calculated as 
a measure of the deviation of the observed P values from 
the uniform distribution [25]. 
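
The MSD criterion described above can be sketched as
follows. The sketch assumes that the expected P value of the
marker with rank r among n sorted P values is r/n; the exact
definition used in [25] may differ, and the function name
msd_from_uniform is hypothetical.

```python
def msd_from_uniform(p_values):
    """Mean squared difference between sorted observed P values and
    the P values expected under a uniform distribution.

    Assumption: the expected P value at rank r (1-based) is r / n.
    """
    n = len(p_values)
    observed = sorted(p_values)
    expected = [rank / n for rank in range(1, n + 1)]
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / n


# Made-up P values for illustration only.
well_calibrated = [0.25, 0.50, 0.75, 1.00]
inflated = [0.01, 0.02, 0.03, 0.04]    # far too many small P values

msd_good = msd_from_uniform(well_calibrated)
msd_bad = msd_from_uniform(inflated)
print(msd_good, msd_bad)
```

A model that controls population structure and familial
relatedness well yields P values close to the uniform
distribution and, hence, a small MSD, whereas a model with
many spurious associations yields a large MSD.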

Based on the Wald statistics, we performed a test for 
the presence of significant (α = 0.05) SNP effects for
each of the three traits. We dealt with the multiple testing 
problem by applying a Bonferroni and amplicon number 
based Bonferroni correction [31]. For the former, we used 
the total number of SNP markers to calculate the Bon- 
ferroni correction, whereas, for the latter, the correction 
was calculated using the number of amplicons from which 
the examined SNPs were derived. The proportion of the 
genetic variance explained by the significant SNPs was 
computed based on the relative reduction in genetic vari- 
ance when the SNPs were added to the model [69]. Simi- 
larly, the proportion of genetic variance explained by the 
D matrix was calculated. Negative values were set to zero. 
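
The two significance thresholds and the explained proportion
of genetic variance can be sketched as below. The number of
amplicons is set to the 2 973 sequenced amplicons as a
placeholder; the number of amplicons from which the
genotyped SNPs were actually derived may be smaller. The
function names and the variance values are hypothetical.

```python
ALPHA = 0.05
N_SNPS = 8244          # SNP markers tested
N_AMPLICONS = 2973     # placeholder assumption: all sequenced amplicons


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def explained_genetic_variance(var_without, var_with):
    """Relative reduction in genetic variance after adding SNPs.

    Negative values are set to zero, as described in the text.
    """
    return max(0.0, (var_without - var_with) / var_without)


snp_threshold = bonferroni_threshold(ALPHA, N_SNPS)
amplicon_threshold = bonferroni_threshold(ALPHA, N_AMPLICONS)
print(f"SNP-wise: {snp_threshold:.2e}, amplicon-wise: {amplicon_threshold:.2e}")

# Made-up variance component estimates, not values from the study.
print(explained_genetic_variance(2.0, 1.7))
print(explained_genetic_variance(2.0, 2.1))
```

Because there are fewer amplicons than SNPs, the
amplicon-wise threshold is less stringent, which reflects
that SNPs within one amplicon are usually in strong LD and
do not represent independent tests.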

Heterotic pools: Similarly to the analyses conducted 
for the whole set of inbred lines, single marker analyses 
were conducted for each of the four heterotic pools. The 
same model was applied, except that no D matrix was 
considered in this case, as the population structure within 
the heterotic pools was modelled by the kinship matrix K. 

Multiple forward regression: In order to take into
account the LD between SNPs, we used, in addition to the
single marker analysis, a multiple forward regression
approach to identify, based on the above described QK
model, those marker combinations which best explain the
genotypic variation. A P-to-enter criterion was used. We
added the SNP with the lowest P value in the single marker
analysis (if significant according to the amplicon-based
Bonferroni correction) as a fixed cofactor in the analyses
when examining all remaining SNP markers for their
association with the phenotype. Due to the high
computational burden, this procedure was repeated only two
times for each of the three traits and, thus, a maximum of
three SNPs could be selected.
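
The control flow of the forward selection described above
can be sketched as follows. The sketch deliberately leaves
out the QK mixed model itself: the genome scan is passed in
as a function, and the toy scan below returns hard-coded,
made-up P values. The names forward_select and toy_scan and
the marker labels are hypothetical.

```python
def forward_select(markers, scan, p_to_enter, max_markers=3):
    """Forward selection with a P-to-enter criterion.

    scan(candidates, cofactors) must return a dict mapping each
    candidate marker to its P value, given the already selected
    cofactors. Selection stops when the best remaining marker is
    not significant or when max_markers have been selected.
    """
    selected = []
    remaining = list(markers)
    while remaining and len(selected) < max_markers:
        p_values = scan(remaining, selected)
        best = min(remaining, key=lambda m: p_values[m])
        if p_values[best] >= p_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_scan(candidates, cofactors):
    """Stand-in for a genome scan; P values are made up."""
    table = {
        0: {"snp_a": 1e-9, "snp_b": 1e-7, "snp_c": 0.2},
        1: {"snp_b": 0.03, "snp_c": 1e-6},  # snp_b was in LD with snp_a
        2: {"snp_b": 0.04},
    }
    return {m: table[len(cofactors)][m] for m in candidates}


chosen = forward_select(["snp_a", "snp_b", "snp_c"], toy_scan, p_to_enter=1.68e-5)
print(chosen)
```

In the toy scan, snp_b loses its significance once snp_a is
included as a cofactor, which is the behaviour expected for
two markers in LD with the same causal variant.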

Detection of epistasis: For each of the three traits, we 
performed a screen for epistatic interactions between the 
significant SNPs from the single marker analysis as well 
as multiple forward regression and all other SNP mark- 
ers. The multiple testing problem was considered using 
the two different Bonferroni corrections. 

The association analyses of SSR markers were per- 
formed with ASREML release 2.0 [65], whereas the asso- 
ciation analyses of SNPs were performed with EMMA 
[70]. 

Additional files 



Additional file 1: Figure S1. Genome-wide P values for association
analysis of flowering time within the different heterotic pools (Flint,
Lancaster, SSS, Iodent, respectively). The ten colors represent the ten
chromosomes. The horizontal dotted and dash-dotted lines correspond
to a nominal 5% significance threshold with Bonferroni and amplicon-wise
Bonferroni correction, respectively. Significant P values are represented by
a star.

Additional file 2: Figure S2. Genome-wide P values for association
analysis of northern corn leaf blight resistance within the different
heterotic pools (Flint, Lancaster, SSS, Iodent, respectively). The ten
colors represent the ten chromosomes. The horizontal dotted and
dash-dotted lines correspond to a nominal 5% significance threshold
with Bonferroni and amplicon-wise Bonferroni correction, respectively.
Significant P values are represented by a star.

Additional file 3: Figure S3. Genome-wide P values for association
analysis of flowering time corrected northern corn leaf blight
resistance within the different heterotic pools (Flint, Lancaster, SSS,
Iodent, respectively). The ten colors represent the ten chromosomes. The
horizontal dotted and dash-dotted lines correspond to a nominal 5%
significance threshold with Bonferroni and amplicon-wise Bonferroni
correction, respectively. Significant P values are represented by a star.

Additional file 4: Figure S4. Significant epistatic interactions
between the most significantly associated SNP for flowering time (FT)
and northern corn leaf blight (NCLB) resistance and all other SNP
markers for the entire set of 1 487 maize inbred lines. pc is the
proportion of the explained genotypic variance.

Additional file 5: Figure S5. P values for association analysis of
flowering time for the entire set of 1 487 maize inbred lines on
chromosome 8 in the Vgt1 region. Significant P values are represented
by a star.

Additional file 6: Figure S6. Deviance of the QK mixed model 
association mapping method applied to northern corn leaf blight 
resistance of the entire germplasm set of the 1487 genotypes 
depending on threshold T. For details, see Materials and Methods. 